This data set appears to be encoded as mac_roman, not UTF-8, so Python will raise decoding errors unless you open the file like this:
open(sys.argv[1], "r", encoding="mac_roman")
Also, there seems to be a bad entry on line 43924. "cowardice" is duplicated, but the e is shifted on the duplicate, which might mess up your data structures if you're not careful.
The data set is fine other than that.
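For anyone who wants to try the encoding suggestion above, here is a minimal sketch. The helper name read_entries is made up for illustration, and mac_roman is only the encoding guessed in this review, not something the README confirms.

```python
def read_entries(path, encoding="mac_roman"):
    """Return the file's lines decoded with the given encoding.

    The default of mac_roman is an assumption taken from this review;
    pass a different codec name if it turns out to be wrong.
    """
    with open(path, "r", encoding=encoding) as handle:
        # Strip line endings so each entry is a clean string.
        return [line.rstrip("\r\n") for line in handle]
```

Calling read_entries(sys.argv[1]) from a script gives the same result as the open() call above, with the file closed automatically once the lines are read.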
November 16, 2018 Subject:
Problem with encoding
While the README file says the text is encoded in plain ASCII, I get errors due to unreadable characters in the file. For example, the line between aby\V and AB\N contains such an unreadable character. I also tried converting the file to ASCII, but the problems persist, which is strange.
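One way to pin down where the unreadable characters sit is to scan the raw bytes for anything outside the ASCII range. This is only a rough sketch: the function name find_non_ascii_lines is hypothetical, and it assumes the entries are separated by newline bytes.

```python
def find_non_ascii_lines(path):
    """Return (line_number, raw_bytes) for each line holding a byte >= 0x80."""
    hits = []
    # Read in binary mode so no codec is involved and nothing can fail to decode.
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            if any(byte >= 0x80 for byte in raw):
                hits.append((number, raw.rstrip(b"\r\n")))
    return hits
```

Printing the returned pairs shows the exact byte values on each offending line, which can help when guessing the file's real encoding.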